Skip to content

Conversation

@franz1981
Copy link

@franz1981
Copy link
Author

Bumping JCTools version is necessary due to some recent fixes around one of the used qs.

It's a shame we're exposing a blocking q API there, or we could've used JCTools/JCTools#340 to batch consume from the queue.

Copy link
Member

@jvz jvz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a manual entry for this as well? With the other options for this plugin in https://github.com/apache/logging-log4j2/blob/master/src/site/asciidoc/manual/appenders.adoc somewhere

@vy
Copy link
Member

vy commented Apr 9, 2021

@franz1981, thanks so much for the contribution. Would you mind updating changes.xml too, please?

@carterkozak
Copy link
Contributor

Thanks for the contribution @franz1981, your PRs are appreciated as always!

I normally handle the changes.xml pieces when I merge PRs since it can be non-obvious (I've certainly gotten it wrong a few times, and I don't think it makes sense to make contributors learn our system).

@carterkozak
Copy link
Contributor

It's a shame we're exposing a blocking q API there, or we could've used JCTools/JCTools#340 to batch consume from the queue.

We could support a lighter weight API with the ability to utilize both BlockingQueue (retaining backwards compatibility) as well as JCTools/JCTools#340, what do you think? Alternatively a separate JCToolsAsyncAppender is an option.

@franz1981
Copy link
Author

franz1981 commented Apr 10, 2021

@jvz

Can you add a manual entry for this as well?

Done!

@carterkozak

Alternatively a separate JCToolsAsyncAppender is an option.

I would go into this direction instead :)
TBH the existing disruptor solution is a better fit in term of features set:

  1. FIFO (I suppose that's mandatory right? If we just need FIFO for the same thread, there are much faster options on JCTools ala https://github.com/JCTools/JCTools/blob/master/jctools-core/src/main/java/org/jctools/queues/MpscCompoundQueue.java#L136
    ie using a fixed size of queues to distribute contention on the tail)
  2. Object pool: this is unbeatable
  3. batch-friendly: no idea how much this feature is used on log4j, but it can be rather important while appending files to batch writes syscalls (on FileChannel pwrite).

Thinking about a JCToolsAsyncAppender I still need to understand if being bounded is such a must have: I'm a great fan of backpressure, really, but with logging...is different.
I see different strategies that could be employed to satisfy different application needs and AsyncQueueFullPolicy already expose some of these: I still see a value in using an unbounded q especially if compared with policies that would change the logs order or block the application threads.

If the unbounded q path is feasable, on JCTools we have some option that perform great under heavy contention eg xadd qs.
The nice thing is that a "mild form" of backpressure is still possible with such alternativies by carefully choosing how to replicate the "object pooling" feature of the Disruptor and use the fixed capacity of such pool to limit the backlog on the unbounded q.

@carterkozak
Copy link
Contributor

Our fully async disruptor implementation is indeed very difficult to beat in benchmarks, largely for the reasons you've mentioned, also due to the tighter control of the threading model we get by integrating it directly into the logger system (See AsyncLoggerContext, AsyncLogger) rather than applying it to individual appenders (which requires defensive copies of LogEvent objects). This makes it more difficult to like-for-like compare a jctools implementation to the disruptor implementation.

  1. FIFO (I suppose that's mandatory right? If we just need FIFO for the same thread, there are much faster options on JCTools ala https://github.com/JCTools/JCTools/blob/master/jctools-core/src/main/java/org/jctools/queues/MpscCompoundQueue.java#L136
    ie using a fixed size of queues to distribute contention on the tail)

Mandatory? That's difficult to say. In the general case I'd say so, however many services push events into complex logging pipelines which handle event ordering based on timestamps or some other identifier, so it doesn't matter so much what order events are produced in (depending on the system, of course!)

  1. Object pool: this is unbeatable

Agreed, this is a big one for us.

  1. batch-friendly: no idea how much this feature is used on log4j, but it can be rather important while appending files to batch writes syscalls (on FileChannel pwrite).

The default configuration doesn't take advantage of this property, however configuring most appenders immediateFlush property allows them to wait until a LogEvent with endOfBatch=true before flushing, in my experience this increases logging throughput by an order of magnitude.

This reminds me, I had meant to research the feasibility of something like libaio for logging file i/o, I recall that you may have some experience with the java bindings? ;-) I imagine eventually io_uring may allow us to batch appends to multiple files into a single syscall, this could be huge as it's common to have some subset of events recorded to multiple files, but I admittedly haven't read as much about how this works in practice as I should.

I still see a value in using an unbounded q especially if compared with policies that would change the logs order or block the application threads.

Thar be dragons! It would be remarkably easy to OOM with an unbounded queue if used incorrectly -- in production systems it's often safer to increase the ringbuffer size and use a discarding policy when the queue is filled, but I don't mind allowing the option for an unbounded queue (on AsyncAppender one can already configure a LinkedBlockingQueue with Integer.MAX_VALUE capacity!). I do like the idea of a queue with a dynamic footprint, at work I use a large constant size ringbuffer which accounts for a large chunk of system memory utilization regardless of load or expectations for the service.

@franz1981
Copy link
Author

franz1981 commented Apr 11, 2021

This reminds me, I had meant to research the feasibility of something like libaio for logging file i/o, I recall that you may have some experience with the java bindings? ;-) I imagine eventually io_uring may allow us to batch appends to multiple files into a single syscall, this could be huge as it's common to have some subset of events recorded to multiple files, but I admittedly haven't read as much about how this works in practice as I should.

libaio need some JNI, as io_uring, but IMO io_uring is to be preferred because more modern. If it worths I've already implemented a version of libaio that minimize the usage of JNI (in favour of Unsafe now and VarHandle/Foreign Access in the future) ie apache/artemis-native#10
and that can be used (with the right changes) as POC to see if it makes sense. Given the nature of being write-only, probably a logger library is a good fit, hence being able to NOT trigger the OS page cache can bring some benefit.

@franz1981
Copy link
Author

@carterkozak do we have some benchmark to run this PR vs the baseline and see if it worths?
Sorry to ask directly, but I don't have any decent HW but my laptop at hand in this very moment..

@carterkozak
Copy link
Contributor

Of course, I’d be happy to run the benchmarks! Unfortunately I’m not sure I’ll be able to get to it this week with other things going on.

@carterkozak
Copy link
Contributor

Sorry for the delay, @franz1981!

This PR rebased to master

java -jar log4j-perf/target/benchmarks.jar ".*AsyncAppenderLog4j2Benchmark\.throughputSimple*"

Benchmark                                                                   (configFileName)   Mode  Cnt        Score        Error  Units
AsyncAppenderLog4j2Benchmark.throughputSimple           perf5AsyncApndNoLoc-noOpAppender.xml  thrpt    4  1454918.276 ± 233834.928  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple     perf5AsyncApndDsrptrNoLoc-noOpAppender.xml  thrpt    4  2386789.891 ±  26025.458  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple      perf5AsyncApndMpscQNoLoc-noOpAppender.xml  thrpt    4  2754800.649 ±  27636.937  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple  perf5AsyncApndMpscParkQNoLoc-noOpAppender.xml  thrpt    4  2172734.114 ±  33748.888  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple      perf5AsyncApndXferQNoLoc-noOpAppender.xml  thrpt    4  2360362.123 ±  54748.437  ops/s

With more threads:
java -jar log4j-perf/target/benchmarks.jar ".*AsyncAppenderLog4j2Benchmark\.throughputSimple*" -t 28

Benchmark                                                                   (configFileName)   Mode  Cnt        Score        Error  Units
AsyncAppenderLog4j2Benchmark.throughputSimple           perf5AsyncApndNoLoc-noOpAppender.xml  thrpt    4    98325.087 ±  14675.006  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple     perf5AsyncApndDsrptrNoLoc-noOpAppender.xml  thrpt    4  1351814.397 ± 283184.236  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple      perf5AsyncApndMpscQNoLoc-noOpAppender.xml  thrpt    4  1337458.477 ± 686440.041  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple  perf5AsyncApndMpscParkQNoLoc-noOpAppender.xml  thrpt    4  1934682.312 ± 157883.496  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple      perf5AsyncApndXferQNoLoc-noOpAppender.xml  thrpt    4  1037054.911 ± 317250.949  ops/s

Master without this PR

java -jar log4j-perf/target/benchmarks.jar ".*AsyncAppenderLog4j2Benchmark\.throughputSimple*"

Benchmark                                                                   (configFileName)   Mode  Cnt        Score        Error  Units
AsyncAppenderLog4j2Benchmark.throughputSimple           perf5AsyncApndNoLoc-noOpAppender.xml  thrpt    4  1464295.244 ± 196064.898  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple     perf5AsyncApndDsrptrNoLoc-noOpAppender.xml  thrpt    4  2196993.122 ± 732245.757  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple      perf5AsyncApndMpscQNoLoc-noOpAppender.xml  thrpt    4  2542884.245 ± 233213.133  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple  perf5AsyncApndMpscParkQNoLoc-noOpAppender.xml  thrpt    4  3173527.805 ± 107428.527  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple      perf5AsyncApndXferQNoLoc-noOpAppender.xml  thrpt    4  2179666.490 ±  83951.441  ops/s

With more threads:
java -jar log4j-perf/target/benchmarks.jar ".*AsyncAppenderLog4j2Benchmark\.throughputSimple*" -t 28

Benchmark                                                                   (configFileName)   Mode  Cnt        Score         Error  Units
AsyncAppenderLog4j2Benchmark.throughputSimple           perf5AsyncApndNoLoc-noOpAppender.xml  thrpt    4   105100.502 ±   17273.746  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple     perf5AsyncApndDsrptrNoLoc-noOpAppender.xml  thrpt    4  1075800.634 ±  278110.477  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple      perf5AsyncApndMpscQNoLoc-noOpAppender.xml  thrpt    4  1517297.622 ±  180276.994  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple  perf5AsyncApndMpscParkQNoLoc-noOpAppender.xml  thrpt    4  3613600.519 ±  102724.103  ops/s
AsyncAppenderLog4j2Benchmark.throughputSimple      perf5AsyncApndXferQNoLoc-noOpAppender.xml  thrpt    4  1100745.644 ± 2014574.543  ops/s

Benchmark Implementation

Draft PR with my benchmark changes used for comparison here: #556

@jvz jvz added the performance Issues or PRs that affect performance, throughput, latency, etc. label Jan 25, 2023
@vy vy deleted the branch apache:master February 28, 2023 15:02
@vy vy closed this Feb 28, 2023
@ppkarwasz
Copy link
Contributor

This was closed automatically by Github because we renamed the release-2.x branch to 2.x. Feel free to resubmit to the 2.x branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Issues or PRs that affect performance, throughput, latency, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants